Search CORE

42 research outputs found

The PETfold and PETcofold web servers for intra- and intermolecular structures of multiple RNA sequences

Author: Arthur
Gorodkin
Havgaard
Havgaard
J. Gorodkin
Knudsen
Kolb
Mattick
P. Menzel
R. Backofen
S. E. Seemann
Schneider
Seemann
Will
Zuker
Publication venue: Oxford University Press
Publication date: 01/01/2011
Field of study

The function of non-coding RNA genes largely depends on their secondary structure and the interaction with other molecules. Thus, an accurate prediction of secondary structure and RNA–RNA interaction is essential for the understanding of biological roles and pathways associated with a specific RNA gene. We present web servers to analyze multiple RNA sequences for common RNA structure and for RNA interaction sites. The web servers are based on the recent PET (Probabilistic Evolutionary and Thermodynamic) models PETfold and PETcofold, but add user friendly features ranging from a graphical layer to interactive usage of the predictors. Additionally, the web servers provide direct access to annotated RNA alignments, such as the Rfam 10.0 database and multiple alignments of 16 vertebrate genomes with human. The web servers are freely available at: http://rth.dk/resources/petfold

Crossref

PubMed Central

Copenhagen University Research Information System

Genome-scale NCRNA homology search using a Hamming distance-based filtration strategy

Author: AF Bompfunewerer
Alex Liu
B Langmead
B Ma
D Sankoff
E Rivas
E Torarinsson
J Buhler
J Buhler
JH Havgaard
JH Havgaard
Jikai Lei
JS Pedersen
KC Pang
Osama Aljawad
P Gardner
P Schattner
PG Higgs
R Klein
R Li
S Chikkagoudar
S Griffiths-Jones
S Schwartz
S Washietl
SF Altschul
T Coenye
Y Sun
Y Sun
Yanni Sun
ZJ Lu
Publication venue: BioMed Central
Publication date: 01/01/2012
Field of study

Crossref

Springer - Publisher Connector

PubMed Central

Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

Author: A Löytynoja
A Löytynoja
B Sipos
BG Hall
BG Hall
BP Blackburne
C Chothia
C Dessimoz
C Kemena
C Kemena
C Notredame
CB Do
CL Strope
DA Dalquen
DA Morrison
DH Mathews
ER Mardis
G Blackshields
G Jordan
G Landan
GP Raghava
I Walle Van
J Kim
J Stoye
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JD Thompson
JH Havgaard
JP Huelsenbeck
K Mizuguchi
LA Stebbings
M Anisimova
M Pop
MR Aniba
P Gardner
RA Cartwright
RB Russell
RC Edgar
RC Edgar
SA Berger
SF Altschul
T Golubchik
T Koestler
T Lassmann
T Lassmann
T Lassmann
W Fletcher
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/11/2012
Field of study

Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies--based on simulation, consistency, protein structure, and phylogeny--and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark depending on the context of application--with a keen awareness of the assumptions underlying each benchmarking strategy.Comment: Revie

arXiv.org e-Print Archive

Crossref

UCL Discovery

An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

Author: A Harmanci
A Harmanci
A Taneda
A Uzilov
A Uzilov
A Wilm
Akito Taneda
B Knudsen
C Lu
C Notredame
C Notredame
C Selig
CC Chang
CMA Davis Jr
D Dalli
D Dalli
D Rose
D Sankoff
DE Goldberg
E Rivas
E Rivas
E Torarinsson
E Torarinsson
E Torarinsson
F Miura
G Gonsalvez
H Kiryu
H Kiryu
I Hofacker
I Hofacker
I Holmes
J Cherry
J Gorodkin
J Havgaard
J Havgaard
J Pedersen
J Schultz
J Thompson
J Thompson
K Katoh
K Missal
K Missal
L David
M Bauer
M Gerstein
M Samanta
P Carninci
R Dowell
R Klein
R Nussinov
S Needleman
S Washietl
S Washietl
S Washietl
S Will
W Gish
X Xu
Y Tabei
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background Aligning RNA sequences with low sequence identity has been a challenging problem since such a computation essentially needs an algorithm with high complexities for taking structural conservation into account. Although many sophisticated algorithms for the purpose have been proposed to date, further improvement in efficiency is necessary to accelerate its large-scale applications including non-coding RNA (ncRNA) discovery. Results We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining with the originally trained SVM, we applied the new algorithm to novel ncRNA discovery where we compared <it>S. cerevisiae </it>genome with six related genomes in a pairwise manner. By focusing our search to the relatively short regions (50 bp to 2,000 bp) sandwiched by conserved sequences, we successfully predict 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in the pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). By comparing with the previous predictions, we found that > 92% of the candidates is novel candidates. The estimated rate of false positives in the predicted candidates is 51%. Twenty-five percent of the intergenic candidates has supports for expression in cell, i.e. their genomic positions overlap those of the experimentally determined transcripts in literature. By manual inspection of the results, moreover, we obtained four multiple alignments with low sequence identity which reveal consensus structures shared by three species/sequences. Conclusion The present method gives an efficient tool complementary to sequence-alignment-based ncRNA finders.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Simultaneous alignment and folding of protein sequences

Author: A. Caprara
B.E. Shakhnovich
C.B. Do
C.B. Do
D. Frishman
D. Sankoff
D.H. Mathews
G. Raghava
I.L. Hofacker
J. Selbig
J. Waldispuhl
J. Waldispuhl
J.H. Havgaard
L.R. Forrest
M. Brudno
M. Cline
M. Lomize
M. Menke
P. Bradley
P. Fariselli
P. Rice
R. Backofen
R. Doolittle
R.A. Sutormin
R.C. Edgar
R.C. Edgar
R.C. Edgar
R.L.J. Dunbrack
S. Henikoff
S. Will
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Accurate comparative analysis tools for low-homology proteins remains a difficult challenge in computational biology, especially sequence alignment and consensus folding problems. We presentpartiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm’s complexity is polynomial in time and space. Algorithmically,partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments,partiFold-Align significantly outperforms state-of-the-art pairwise sequence alignment tools in the most difficult low sequence homology case and improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families. partiFold-Align is available at http://partiFold.csail.mit.edu

CiteSeerX

DSpace@MIT

Crossref

Can Clustal-style progressive pairwise alignment of multiple sequences be used in RNA secondary structure prediction?

Author: Amelia B Bellamy-Royds
B Masoumi
D Gautheret
D Mathews
D Sankoff
DH Mathews
DH Mathews
G Pavesi
G Storz
IL Hofacker
IL Hofacker
J Felsenstein
J Gorodkin
JD Thompson
JH Havgaard
JJ Cannone
KJ Doshi
M Anwar
M Sprinzl
M Zuker
M Zuker
M Zuker
Marcel Turcotte
P Rice
PP Gardner
PP Gardner
R Gutell
RD Dowell
Publication venue: BioMed Central
Publication date: 01/01/2007
Field of study

Abstract Background In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides. Results The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure. Conclusion We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PicXAA-R: Efficient structural alignment of multiple RNA sequences using a greedy approach

Author: A Wilm
A Wilm
AO Harmanci
AS Schwartz
B Paten
Byung-Jun Yoon
C Do
C Notredame
CB Do
CB Do
CB Do
D Dalli
D Sankoff
DH Mathews
DH Mathews
FF Costa
G Storz
H Kiryu
H Kiryu
I Holmes
IL Hofacker
IL Hofacker
IL Hofacker
J Gorodkin
JH Havgaard
JH Havgaard
JS McCaskill
K Katoh
M Anwar
M Bauer
M Hamada
M Hamada
R Durbin
RD Dowell
RK Bradley
RK Bradley
S Griffiths-Jones
S Lindgreen
S Moretti
S Siebert
S Wang
S Washietl
S Will
Sayed Mohammad Ebrahim Sahraeian
SM Sahraeian
SR Eddy
U Roshan
X Xu
Y Tabei
ZJ Lu
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has grasped more and more attentions as recent studies unveiled the significance of ncRNAs in living organisms. While the Sankoff style structural alignment algorithms cannot efficiently serve for multiple sequences, mostly progressive schemes are used to reduce the complexity. However, this idea tends to propagate the early stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA which constructs an accurate alignment in a non-progressive fashion. Results Here, we propose PicXAA-R as an extension to PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently grasps both folding information within each sequence and local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using the information of all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment from sequence regions with high base-pairing and base alignment probabilities. Conclusions Several experiments on datasets with different characteristics confirm that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and it consistently yields accurate alignment results, especially for datasets with locally similar sequences. PicXAA-R source code is freely available at: <url>http://www.ece.tamu.edu/~bjyoon/picxaa/</url>.</p

Crossref

Directory of Open Access Journals

PubMed Central

Texas A&M Repository

Porcine transcriptome analysis based on 97 non-normalized cDNA libraries and assembly of 1,021,891 expressed sequence tags.

RIGHTS : This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.BACKGROUND: Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages. RESULTS: Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scaling invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories. CONCLUSION: This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies.Published versio

Crossref

Springer - Publisher Connector

PubMed Central

Copenhagen University Research Information System

Apollo (Cambridge)

University of Southern Denmark Research Output

Online Research Database In Technology

ScholarBank@NUS

Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints

Author: AV Uzilov
B Gulko
B Knudsen
B Knudsen
B Morgenstern
D Sankoff
DH Mathews
DH Mathews
DH Mathews
DKY Chiu
DS Fields
E Rivas
G Storz
I Holmes
I Holmes
I Holmes
IL Hofacker
IL Hofacker
IL Hofacker
J Gorodkin
J Gorodkin
J Gorodkin
J Reeder
J Wuyts
J Wuyts
JE Hopcroft
JE Tabaska
JH Havgaard
M Zuker
M Zuker
MS Waterman
NR Pace
O Perriquet
PP Gardner
R Durbin
R Giegerich
R Green
R Lück
R Nussinov
RD Dowell
RD Dowell
Robin D Dowell
RR Gutell
RR Gutell
RR Gutell
S Batzoglou
S Griffiths-Jones
Sean R Eddy
SR Eddy
SV Muse
V Juan
VR Akmaev
Publication venue: BioMed Central
Publication date: 01/01/2006
Field of study

BACKGROUND: We are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: development of a good unified scoring system for alignment and folding and development of practical heuristics for dealing with the computational complexity of the algorithm. RESULTS: We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed which assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment. CONCLUSION: Pairwise RNA structural alignment improves on structure prediction accuracy relative to single sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm – this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN – have comparable overall performance with different strengths and weaknesses

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework

Author: A Wilm
B Knudsen
BW Matthews
C Notredame
C Notredame
C Notredame
CB Do
CB Do
D Dalli
DF Feng
DH Mathews
E Torarinsson
GJ Barton
H Kiryu
H Kiryu
H Zhou
Hiroyuki Toh
I Holmes
IL Hofacker
J Heringa
J Pei
J Reeder
JD Thompson
JD Thompson
JH Havgaard
JS McCaskill
JS Papadopoulos
K Katoh
K Katoh
Kazutaka Katoh
M Bauer
M Hochsmann
M Kruspe
M Vingron
MA Larkin
MM Yusupov
O Gotoh
O Gotoh
PP Gardner
PP Gardner
S Griffiths-Jones
S Lindgreen
S Washietl
SB Needleman
SR Eddy
TF Smith
VA Simossis
X Xu
Y Tabei
Y Tabei
YM Hou
Z Yao
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background Structural alignment of RNAs is becoming important, since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have resulted in considerable improvement in the accuracy of pairwise structural alignment. In contrast, for the cases with more than two sequences, the practical merit of structural alignment remains unclear as compared to traditional sequence-based methods, although the importance of multiple structural alignment is widely recognized. Results We took a different approach from a straightforward extension of the Sankoff algorithm to the multiple alignments from the viewpoints of accuracy and time complexity. As a new option of the MAFFT alignment program, we developed a multiple RNA alignment framework, X-INS-i, which builds a multiple alignment with an iterative method incorporating structural information through two components: (1) pairwise structural alignments by an external pairwise alignment method such as SCARNA or LaRA and (2) a new objective function, Four-way Consistency, derived from the base-pairing probability of every sub-aligned group at every multiple alignment stage. Conclusion The BRAliBASE benchmark showed that X-INS-i outperforms other methods currently available in the sum-of-pairs score (SPS) criterion. As a basis for predicting common secondary structure, the accuracy of the present method is comparable to or rather higher than those of the current leading methods such as RNA Sampler. The X-INS-i framework can be used for building a multiple RNA alignment from any combination of algorithms for pairwise RNA alignment and base-pairing probability. The source code is available at the webpage found in the Availability and requirements section.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central